278 research outputs found

    Interpreting 16S metagenomic data without clustering to achieve sub-OTU resolution

    The standard approach to analyzing 16S tag sequence data, which relies on clustering reads by sequence similarity into Operational Taxonomic Units (OTUs), underexploits the accuracy of modern sequencing technology. We present a clustering-free approach to multi-sample Illumina datasets that can identify independent bacterial subpopulations regardless of the similarity of their 16S tag sequences. Using published data from a longitudinal time-series study of human tongue microbiota, we are able to resolve within standard 97% similarity OTUs up to 20 distinct subpopulations, all ecologically distinct but with 16S tags differing by as little as 1 nucleotide (99.2% similarity). A comparative analysis of oral communities of two cohabiting individuals reveals that most such subpopulations are shared between the two communities at 100% sequence identity, and that dynamical similarity between subpopulations in one host is strongly predictive of dynamical similarity between the same subpopulations in the other host. Our method can also be applied to samples collected in cross-sectional studies and can be used with the 454 sequencing platform. We discuss how the sub-OTU resolution of our approach can provide new insight into factors shaping community assembly. Comment: Updated to match the published version. 12 pages, 5 figures + supplement. Significantly revised for clarity, references added, results not change

    Data hosting infrastructure for primary biodiversity data

    © The Author(s), 2011. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in BMC Bioinformatics 12 Suppl. 15 (2011): S5, doi:10.1186/1471-2105-12-S15-S5. Today, an unprecedented volume of primary biodiversity data is being generated worldwide, yet significant amounts of these data have been and will continue to be lost after the conclusion of the projects tasked with collecting them. To get the most value out of these data it is imperative to seek a solution whereby these data are rescued, archived and made available to the biodiversity community. To this end, the biodiversity informatics community requires investment in processes and infrastructure to mitigate data loss and provide solutions for long-term hosting and sharing of biodiversity data. We review the current state of biodiversity data hosting and investigate the technological and sociological barriers to proper data management. We further explore the rescuing and re-hosting of legacy data and the state of existing toolsets, and propose a future direction for the development of new discovery tools. We also explore the role of data standards and licensing in the context of data hosting and preservation. We provide five recommendations for the biodiversity community that will foster better data preservation and access: (1) encourage the community's use of data standards, (2) promote the public domain licensing of data, (3) establish a community of those involved in data hosting and archival, (4) establish hosting centers for biodiversity data, and (5) develop tools for data discovery. The community's adoption of standards and development of tools to enable data discovery are essential to sustainable data preservation. Furthermore, the increased adoption of open content licensing, the establishment of data hosting infrastructure and the creation of a data hosting and archiving community are all necessary steps towards the community ensuring that data archival policies become standardized.

    A Benchmark of Parametric Methods for Horizontal Transfers Detection

    Horizontal gene transfer (HGT) appears to be important for the evolution of prokaryotic species. As a consequence, numerous parametric methods, which use only the information embedded in the genomes, have been designed to detect HGTs. Numerous reports have been published of incongruencies between the results of different methods applied to the same genomes. The use of artificial genomes in which all HGT parameters are controlled allows different methods to be tested under the same conditions. The results of this benchmark of 16 representative parametric methods showed a great variety of efficiencies. Some methods work very poorly whatever the type of HGT, and some depend on the conditions or on the metrics used. The best methods in terms of total errors were those using tetranucleotides as the criterion for window methods, or those using codon usage for gene-based methods, together with the Kullback-Leibler divergence metric. Window methods are very sensitive but less specific, and they detect lone, isolated genes poorly. Gene-based methods, on the other hand, are often very specific but lack sensitivity. We propose using two methods in combination to get the best of each category: a gene-based one for specificity and a window-based one for sensitivity.
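
    As a rough illustration of the kind of window method the abstract describes, the sketch below scores each sliding window by the Kullback-Leibler divergence between its tetranucleotide frequencies and those of the whole genome. It is not one of the 16 benchmarked implementations: the function names, window size, step and pseudocount are all assumptions chosen for illustration.

```python
from collections import Counter
from itertools import product
from math import log2

K = 4  # tetranucleotides
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def kmer_freqs(seq, pseudocount=1.0):
    """Pseudocount-smoothed frequencies of all 256 tetranucleotides in seq.

    k-mers containing characters other than A, C, G, T are ignored.
    The pseudocount value is an assumption, used to avoid log(0).
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    total = sum(counts[k] for k in KMERS) + pseudocount * len(KMERS)
    return {k: (counts[k] + pseudocount) / total for k in KMERS}


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(p[k] * log2(p[k] / q[k]) for k in KMERS)


def window_scores(genome, window=5000, step=2500):
    """Return (start, KL divergence) for each sliding window.

    Each window is compared with the genome-wide background; window and
    step sizes are hypothetical defaults, not values from the paper.
    """
    background = kmer_freqs(genome)
    scores = []
    for start in range(0, len(genome) - window + 1, step):
        freqs = kmer_freqs(genome[start:start + window])
        scores.append((start, kl_divergence(freqs, background)))
    return scores
```

    Windows with unusually high divergence would be candidates for horizontally acquired regions; choosing the cutoff is exactly the kind of condition-dependent decision the benchmark evaluates.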

    PIPS: Pathogenicity Island Prediction Software

    The adaptability of pathogenic bacteria to hosts is influenced by the genomic plasticity of the bacteria, which can be increased by such mechanisms as horizontal gene transfer. Pathogenicity islands play a major role in this type of gene transfer because they are large, horizontally acquired regions that harbor clusters of virulence genes that mediate the adhesion, colonization, invasion, immune system evasion, and toxigenic properties of the acceptor organism. Currently, pathogenicity islands are mainly identified in silico based on various characteristic features: (1) deviations in codon usage, G+C content or dinucleotide frequency and (2) insertion sequences and/or tRNA genetic flanking regions together with transposase coding genes. Several computational techniques for identifying pathogenicity islands exist. However, most of these techniques are only directed at the detection of horizontally transferred genes and/or the absence of certain genomic regions of the pathogenic bacterium in closely related non-pathogenic species. Here, we present a novel software suite designed for the prediction of pathogenicity islands (pathogenicity island prediction software, or PIPS). In contrast to other existing tools, our approach is capable of utilizing multiple features for pathogenicity island detection in an integrative manner. We show that PIPS provides better accuracy than other available software packages. As an example, we used PIPS to study the veterinary pathogen Corynebacterium pseudotuberculosis, in which we identified seven putative pathogenicity islands.

    Sequence of the hyperplastic genome of the naturally competent Thermus scotoductus SA-01

    Background: Many strains of Thermus have been isolated from hot environments around the world. Thermus scotoductus SA-01 was isolated from fissure water collected 3.2 km below the surface in a South African gold mine. The isolate is capable of dissimilatory iron reduction and of growth with oxygen and nitrate as terminal electron acceptors, and its ability to reduce a variety of metal ions, including gold, chromate and uranium, has been demonstrated. The genomes from two different Thermus thermophilus strains have been completed. This paper represents the completed genome from a second Thermus species, T. scotoductus. Results: The genome of Thermus scotoductus SA-01 consists of a chromosome of 2,346,803 bp and a small plasmid which together are about 11% larger than the Thermus thermophilus genomes. The T. thermophilus megaplasmid genes are part of the T. scotoductus chromosome, and extensive rearrangement, deletion of nonessential genes and acquisition of gene islands have occurred, leading to a loss of synteny between the chromosomes of T. scotoductus and T. thermophilus. At least nine large inserts, of which seven were identified as alien, were found, the most remarkable being a denitrification cluster and two operons relating to the metabolism of phenolics which appear to have been acquired from Meiothermus ruber. The majority of acquired genes are from closely related species of the Deinococcus-Thermus group, and many of the remaining genes are from microorganisms with a thermophilic or hyperthermophilic lifestyle. The natural competence of Thermus scotoductus was confirmed experimentally, as expected, since most of the proteins of the natural transformation system of Thermus thermophilus are present. Analysis of the metabolic capabilities revealed an extensive energy metabolism with many aerobic and anaerobic respiratory options. An abundance of sensor histidine kinases, response regulators and transporters for a wide variety of compounds is indicative of an oligotrophic lifestyle. Conclusions: The genome of Thermus scotoductus SA-01 shows remarkable plasticity, with the loss, acquisition and rearrangement of large portions of its genome compared to Thermus thermophilus. Its ability to naturally take up foreign DNA has helped it adapt rapidly to a subsurface lifestyle in the presence of a dense and diverse population which acted as a source of nutrients. The genome of Thermus scotoductus illustrates how rapid adaptation can be achieved by a highly dynamic and plastic genome.

    Satellite remote sensing data can be used to model marine microbial metabolite turnover

    Sampling ecosystems, even at a local scale, at the temporal and spatial resolution necessary to capture natural variability in microbial communities is prohibitively expensive. We extrapolated marine surface microbial community structure and metabolic potential from 72 16S rRNA amplicon and 8 metagenomic observations using remotely sensed environmental parameters to create a system-scale model of marine microbial metabolism for 5904 grid cells (49 km²) in the Western English Channel, across 3 years of weekly averages. Thirteen environmental variables predicted the relative abundance of 24 bacterial Orders and 1715 unique enzyme-encoding genes that encode turnover of 2893 metabolites. The genes' predicted relative abundance was highly correlated (Pearson correlation 0.72, P-value < 10⁻⁶) with their observed relative abundance in sequenced metagenomes. Predictions of the relative turnover (synthesis or consumption) of CO2 were significantly correlated with observed surface CO2 fugacity. The spatial and temporal variation in the predicted relative abundances of genes coding for cyanase, carbon monoxide and malate dehydrogenase were investigated along with the predicted inter-annual variation in relative consumption or production of ~3000 metabolites forming six significant temporal clusters. These spatiotemporal distributions could possibly be explained by the co-occurrence of anaerobic and aerobic metabolisms associated with localized plankton blooms or sediment resuspension, which facilitate the presence of anaerobic micro-niches. This predictive model provides a general framework for focusing future sampling and experimental design to relate biogeochemical turnover to microbial ecology.

    The coral core microbiome identifies rare bacterial taxa as ubiquitous endosymbionts

    © 2015 International Society for Microbial Ecology. All rights reserved. Despite being one of the simplest metazoans, corals harbor some of the most highly diverse and abundant microbial communities. Differentiating core, symbiotic bacteria from this diverse host-associated consortium is essential for characterizing the functional contributions of bacteria but has not been possible yet. Here we characterize the coral core microbiome and demonstrate clear phylogenetic and functional divisions between the micro-scale, niche habitats within the coral host. In doing so, we discover seven distinct bacterial phylotypes that are universal to the core microbiome of coral species, separated by thousands of kilometres of oceans. The two most abundant phylotypes are co-localized specifically with the corals' endosymbiotic algae and symbiont-containing host cells. These bacterial symbioses likely facilitate the success of the dinoflagellate endosymbiosis with corals in diverse environmental regimes.

    Computational Bacterial Genome-Wide Analysis of Phylogenetic Profiles Reveals Potential Virulence Genes of Streptococcus agalactiae

    The phylogenetic profile of a gene is a reflection of its evolutionary history and can be defined as the differential presence or absence of a gene in a set of reference genomes. It has been employed to facilitate the prediction of gene functions. However, the hypothesis that the application of this concept can also facilitate the discovery of bacterial virulence factors has not been fully examined. In this paper, we test this hypothesis and report a computational pipeline designed to identify previously unknown bacterial virulence genes using group B streptococcus (GBS) as an example. Phylogenetic profiles of all GBS genes across 467 bacterial reference genomes were determined by candidate-against-all BLAST searches, which were then used to identify candidate virulence genes by machine learning models. Evaluation experiments with known GBS virulence genes suggested good functional and model consistency in cross-validation analyses (areas under ROC curve, 0.80 and 0.98 respectively). Inspection of the top-10 genes in each of the 15 virulence functional groups revealed at least 15 (of 119) homologous genes implicated in virulence in other human pathogens but previously unrecognized as potential virulence genes in GBS. Among these highly-ranked genes, many encode hypothetical proteins with possible roles in GBS virulence. Thus, our approach has led to the identification of a set of genes potentially affecting the virulence potential of GBS, which are potential candidates for further in vitro and in vivo investigations. This computational pipeline can also be extended to in silico analysis of virulence determinants of other bacterial pathogens.
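
    To make the notion of a phylogenetic profile concrete, the sketch below turns per-genome BLAST results for one gene into a binary presence/absence vector. The input layout (a mapping from genome name to best-hit E-value), the E-value cutoff and the function names are hypothetical; the abstract does not state how hits were thresholded, and the machine learning step is not shown.

```python
def phylogenetic_profile(best_evalues, genomes, evalue_cutoff=1e-5):
    """Binary presence/absence vector for one gene across reference genomes.

    best_evalues maps genome name -> best BLAST E-value for the gene;
    a genome missing from the mapping is treated as having no hit.
    The cutoff is an illustrative assumption, not a value from the paper.
    """
    no_hit = float("inf")
    return [1 if best_evalues.get(g, no_hit) <= evalue_cutoff else 0
            for g in genomes]


def hamming_distance(profile_a, profile_b):
    """Number of reference genomes in which two profiles disagree."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same reference genomes")
    return sum(a != b for a, b in zip(profile_a, profile_b))
```

    Profiles built this way form one feature vector per gene, so genes with similar presence/absence patterns across the reference genomes end up close to one another under a distance such as the Hamming distance.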